Test fixes, upgrade quickcheck #640

Open
wants to merge 24 commits into
base: master

Conversation

PetrGlad
Contributor

  1. quickcheck 1.x input values vary more than in 0.9, which exposes overflow cases and excessive memory allocation problems in Sample::lerp. Calculations now use f64 to avoid that. Conversion back to 16-bit resolution is done explicitly, since an `as` conversion may silently discard the most significant bits. Alternatively, the interpolation coefficient (numerator/denominator) could use e.g. u16 instead of u32, or maybe even floating point. I believe u32 precision is unnecessary there, but changing it requires updating the sample rate converter, which at the moment is the only user of this interpolation.
  2. Documentation for Sample::lerp was incorrect. It said that calculations should follow c * first + (1 - c) * second, which gives first at c==1 and second at c==0, but the actual implementations do the opposite: first at c==0 and second at c==1.
  3. Fix documentation examples. Some of the examples are marked as no_run since they require audio devices and may actually make sounds while cargo test command is running.
  4. Include documentation tests into CI. --all-targets switch excludes doc tests, so those are executed separately.
  5. Non experimental builds exclude some code and in some code parts use mutually exclusive implementations, so I included those in CI too to keep experimental code compilable.
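To make items 1 and 2 concrete, here is a minimal sketch of a linear interpolation over 16-bit samples with a widened intermediate type and an explicit narrowing step. The function name `lerp_i16` is hypothetical, and the sketch uses i64 rather than f64 so that the checked conversion can be shown with `try_from`; it is not the code from this PR.

```rust
use std::convert::TryFrom;

/// Linear interpolation between two 16-bit samples.
/// Returns `first` when `numerator == 0` and `second` when
/// `numerator == denominator`.
/// Hypothetical helper for illustration; not rodio's `Sample::lerp`.
pub fn lerp_i16(first: i16, second: i16, numerator: u32, denominator: u32) -> i16 {
    assert!(denominator != 0, "denominator must be non-zero");
    // Widen to i64: |second - first| <= 65535 and numerator < 2^32,
    // so the product stays below 2^48 and cannot overflow.
    let first = i64::from(first);
    let second = i64::from(second);
    let value = first + (second - first) * i64::from(numerator) / i64::from(denominator);
    // Explicit narrowing: an integer `as i16` cast would silently drop the
    // high bits, while `try_from` reports an out-of-range value.
    i16::try_from(value).expect("interpolated value does not fit into i16")
}
```

With `numerator <= denominator` the result always lies between the two inputs, so the `expect` can only fire when a caller passes a coefficient greater than one.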

src/conversions/sample.rs (resolved)
src/conversions/sample_rate.rs (outdated, resolved)
Collaborator

@dvdsk dvdsk left a comment

Nice fixes and improvements 👍 , thank you very much :) . This must have taken a while!

There are a few things I am unsure about, as you'll see in the discussion. It's a bit too late for me to check on the performance impact of the try_from. I'll see if I can add a new interpolate benchmark to rodio tomorrow.

examples/automatic_gain_control.rs (outdated, resolved)
src/buffer.rs (outdated, resolved)
src/conversions/sample.rs (resolved)
src/conversions/sample.rs (resolved)
src/conversions/sample_rate.rs (outdated, resolved)
Restoring len() test, it is actually useful to check
`SampleRateConverter::size_hint()` implementation.
Although its implementation is not precise (see TODOs).
@dvdsk
Collaborator

dvdsk commented Nov 16, 2024

Love the work here, but unfortunately I cannot look at it for the next few days. After the weekend I'll see if I can come up with a benchmark for the resampler. We need one anyway (I want to introduce a hi-fi resampler in the future). I'll do a detailed review and answer all the questions that came up then.

@PetrGlad
Contributor Author

PetrGlad commented Nov 16, 2024

Looking at other filters in rodio, I am now curious if rodio should only provide basic functions and use RustAudio/dasp for complex processing instead. Some of the rodio functionality is already implemented there.

As I understand it, rodio in most cases can use generic parameters to avoid dyn references (but Sync and Mixer still use them). That can probably be slightly more efficient. Is there any other reason rodio cannot use existing external libraries for the sound processing graph?

@dvdsk
Collaborator

dvdsk commented Nov 18, 2024

I am now curious if rodio should only provide basic functions and use RustAudio/dasp for complex processing instead.

Rodio and dasp have fundamentally different goals, which makes it hard and maybe unwise to merge them or to require users to be familiar with both.
While rodio is an audio playback crate, dasp is a signal processing suite. When rodio is used, it's usually not the main feature of the application. The app might be a simple game, a UI with sounds on click, or a podcast app. So rodio should focus on doing most things efficiently while getting out of the user's way. That is why, for example, Sink exists. You do not need it; it adds nothing that cannot be done with the other parts of rodio. The Sink API does, however, cover most use cases and is easy to understand and use. (It's also far from perfect and has its own issues, which we are going to address.)

Some of the rodio functionality is already implemented there.

Rodio predates dasp, so it's easy to turn this around and ask why they do not depend on rodio. Again, it's a difference in goals: dasp wants to have zero dependencies, rodio just wants to be easy to use (so no C deps if possible).

use RustAudio/dasp for complex processing instead

I also think rodio has most features dasp has since #602 landed. I might be mistaken there. And a Source that makes it easy to use dasp from rodio might be interesting.

The sinc interpolator in dasp is interesting, rodio needs an option for a slower but more hifi interpolator. See #584.

Collaborator

@dvdsk dvdsk left a comment

Only AGC example and what is possibly a superfluous --doc argument in ci.yml left and then we can merge this 👍

Edit: oh and the extra asserts you added might need a message, if only to make reading the code easier.

@dvdsk
Collaborator

dvdsk commented Nov 18, 2024

Still have not found the time for the benchmark. As soon as I've done that I'll post the results here and we can decide what to do based on them.

src/source/speed.rs (outdated, resolved)
@dvdsk
Collaborator

dvdsk commented Nov 21, 2024

Benchmark results from main on my machine
(see bench/resampler.rs, now on master):

If you rebase on (or merge with) main, I'll rerun them on the PR and we can see if something changed.

 Timer precision: 20 ns
resampler       fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ resample_to                │               │               │               │         │
   ├─ 8000      2.145 ms      │ 2.639 ms      │ 2.158 ms      │ 2.169 ms      │ 100     │ 100
   ├─ 11025     2.404 ms      │ 3.197 ms      │ 2.414 ms      │ 2.423 ms      │ 100     │ 100
   ├─ 16000     3.015 ms      │ 3.096 ms      │ 3.042 ms      │ 3.044 ms      │ 100     │ 100
   ├─ 22050     3.51 ms       │ 3.726 ms      │ 3.524 ms      │ 3.528 ms      │ 100     │ 100
   ├─ 44100     2.301 ms      │ 3.162 ms      │ 2.446 ms      │ 2.527 ms      │ 100     │ 100
   ├─ 48000     6.308 ms      │ 12.88 ms      │ 6.348 ms      │ 6.5 ms        │ 100     │ 100
   ├─ 88200     9.887 ms      │ 10.62 ms      │ 10.21 ms      │ 10.2 ms       │ 100     │ 100
   ├─ 96000     11.56 ms      │ 21.68 ms      │ 11.84 ms      │ 12.19 ms      │ 100     │ 100
   ├─ 176400    19.08 ms      │ 25.25 ms      │ 19.1 ms       │ 19.23 ms      │ 100     │ 100
   ├─ 192000    22.2 ms       │ 22.92 ms      │ 22.23 ms      │ 22.24 ms      │ 100     │ 100
   ├─ 352800    38.78 ms      │ 39.18 ms      │ 38.81 ms      │ 38.83 ms      │ 100     │ 100
   ╰─ 384000    44.21 ms      │ 45.07 ms      │ 44.25 ms      │ 44.28 ms      │ 100     │ 100

@dvdsk
Collaborator

dvdsk commented Nov 24, 2024

Benchmarks for this PR on my machine:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.414 ms      │ 2.701 ms      │ 2.508 ms      │ 2.515 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.671 ms      │ 2.755 ms      │ 2.692 ms      │ 2.693 ms      │ 100     │ 100
   ├─ 11025       2.416 ms      │ 2.682 ms      │ 2.433 ms      │ 2.436 ms      │ 100     │ 100
   ├─ 16000       3.723 ms      │ 3.812 ms      │ 3.775 ms      │ 3.772 ms      │ 100     │ 100
   ├─ 22050       3.505 ms      │ 3.62 ms       │ 3.527 ms      │ 3.53 ms       │ 100     │ 100
   ├─ 44100       2.384 ms      │ 3.039 ms      │ 2.456 ms      │ 2.486 ms      │ 100     │ 100
   ├─ 48000       7.421 ms      │ 7.512 ms      │ 7.435 ms      │ 7.439 ms      │ 100     │ 100
   ├─ 88200       11.22 ms      │ 11.38 ms      │ 11.31 ms      │ 11.31 ms      │ 100     │ 100
   ├─ 96000       12.87 ms      │ 13.2 ms       │ 13.18 ms      │ 13.17 ms      │ 100     │ 100
   ├─ 176400      20.75 ms      │ 20.91 ms      │ 20.85 ms      │ 20.85 ms      │ 100     │ 100
   ├─ 192000      23.73 ms      │ 27.32 ms      │ 23.94 ms      │ 23.99 ms      │ 100     │ 100
   ├─ 352800      40.9 ms       │ 41.99 ms      │ 40.95 ms      │ 41 ms         │ 100     │ 100
   ╰─ 384000      46.4 ms       │ 47.51 ms      │ 46.45 ms      │ 46.48 ms      │ 100     │ 100

main branch:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.294 ms      │ 5.027 ms      │ 2.498 ms      │ 2.591 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.129 ms      │ 2.307 ms      │ 2.151 ms      │ 2.153 ms      │ 100     │ 100
   ├─ 11025       2.386 ms      │ 2.444 ms      │ 2.4 ms        │ 2.402 ms      │ 100     │ 100
   ├─ 16000       3.005 ms      │ 5.783 ms      │ 3.042 ms      │ 3.075 ms      │ 100     │ 100
   ├─ 22050       3.479 ms      │ 4.052 ms      │ 3.49 ms       │ 3.497 ms      │ 100     │ 100
   ├─ 44100       2.283 ms      │ 3.367 ms      │ 2.504 ms      │ 2.574 ms      │ 100     │ 100
   ├─ 48000       6.254 ms      │ 7.076 ms      │ 6.26 ms       │ 6.272 ms      │ 100     │ 100
   ├─ 88200       9.795 ms      │ 20.01 ms      │ 10.09 ms      │ 10.24 ms      │ 100     │ 100
   ├─ 96000       11.52 ms      │ 21.97 ms      │ 11.53 ms      │ 11.74 ms      │ 100     │ 100
   ├─ 176400      19.02 ms      │ 19.15 ms      │ 19.03 ms      │ 19.03 ms      │ 100     │ 100
   ├─ 192000      22.13 ms      │ 22.26 ms      │ 22.14 ms      │ 22.15 ms      │ 100     │ 100
   ├─ 352800      38.63 ms      │ 38.76 ms      │ 38.65 ms      │ 38.66 ms      │ 100     │ 100
   ╰─ 384000      44.03 ms      │ 44.17 ms      │ 44.06 ms      │ 44.06 ms      │ 100     │ 100

So that's a 5 to 10% slowdown. A bit too much. Let's see what causes it.
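For anyone wanting to put a number on individual rows, the relative change between two timings can be computed as below. This is only a small sketch; the function name `slowdown_percent` is made up for illustration and is not part of the benchmark harness.

```rust
/// Relative change of `candidate_ms` versus `baseline_ms`, in percent.
/// Positive values mean the candidate is slower than the baseline.
/// Hypothetical helper for reading the tables above.
pub fn slowdown_percent(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (candidate_ms - baseline_ms) / baseline_ms * 100.0
}
```

For example, `slowdown_percent(44.06, 46.45)` uses the 384000 medians from the main and PR tables above and comes out at roughly 5.4.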

@dvdsk
Collaborator

dvdsk commented Nov 24, 2024

With the try_from in sample.rs replaced with a cast, performance increases a tiny bit; however, it does not go back to what it was.

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.378 ms      │ 2.863 ms      │ 2.412 ms      │ 2.441 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.651 ms      │ 2.736 ms      │ 2.681 ms      │ 2.681 ms      │ 100     │ 100
   ├─ 11025       2.374 ms      │ 2.436 ms      │ 2.395 ms      │ 2.397 ms      │ 100     │ 100
   ├─ 16000       3.665 ms      │ 3.826 ms      │ 3.764 ms      │ 3.762 ms      │ 100     │ 100
   ├─ 22050       3.461 ms      │ 5.185 ms      │ 3.501 ms      │ 3.53 ms       │ 100     │ 100
   ├─ 44100       2.377 ms      │ 2.83 ms       │ 2.4 ms        │ 2.428 ms      │ 100     │ 100
   ├─ 48000       7.217 ms      │ 13.47 ms      │ 7.375 ms      │ 7.502 ms      │ 100     │ 100
   ├─ 88200       10.86 ms      │ 15.74 ms      │ 11.03 ms      │ 11.1 ms       │ 100     │ 100
   ├─ 96000       12.44 ms      │ 15.84 ms      │ 12.9 ms       │ 12.94 ms      │ 100     │ 100
   ├─ 176400      20.03 ms      │ 20.73 ms      │ 20.51 ms      │ 20.36 ms      │ 100     │ 100
   ├─ 192000      23.3 ms       │ 23.87 ms      │ 23.44 ms      │ 23.45 ms      │ 100     │ 100
   ├─ 352800      39.94 ms      │ 40.4 ms       │ 40.12 ms      │ 40.13 ms      │ 100     │ 100
   ╰─ 384000      44.42 ms      │ 45.61 ms      │ 44.47 ms      │ 44.59 ms      │ 100     │ 100

This PR with the changes to sample.rs reverted again matches the main branch in perf:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.304 ms      │ 2.825 ms      │ 2.429 ms      │ 2.447 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.123 ms      │ 2.217 ms      │ 2.17 ms       │ 2.168 ms      │ 100     │ 100
   ├─ 11025       2.372 ms      │ 4.046 ms      │ 2.387 ms      │ 2.408 ms      │ 100     │ 100
   ├─ 16000       2.982 ms      │ 3.104 ms      │ 3.035 ms      │ 3.03 ms       │ 100     │ 100
   ├─ 22050       3.486 ms      │ 6.186 ms      │ 3.497 ms      │ 3.525 ms      │ 100     │ 100
   ├─ 44100       2.287 ms      │ 2.83 ms       │ 2.384 ms      │ 2.423 ms      │ 100     │ 100
   ├─ 48000       6.27 ms       │ 6.387 ms      │ 6.279 ms      │ 6.282 ms      │ 100     │ 100
   ├─ 88200       9.841 ms      │ 10.78 ms      │ 10.14 ms      │ 10.15 ms      │ 100     │ 100
   ├─ 96000       11.77 ms      │ 13.12 ms      │ 11.8 ms       │ 11.85 ms      │ 100     │ 100
   ├─ 176400      19.11 ms      │ 20.49 ms      │ 19.14 ms      │ 19.17 ms      │ 100     │ 100
   ├─ 192000      22.23 ms      │ 23.51 ms      │ 22.26 ms      │ 22.28 ms      │ 100     │ 100
   ├─ 352800      38.82 ms      │ 41.43 ms      │ 38.84 ms      │ 38.87 ms      │ 100     │ 100
   ╰─ 384000      44.25 ms      │ 45.22 ms      │ 44.28 ms      │ 44.31 ms      │ 100     │ 100

The switch from i32 to i64 is probably to blame. Using i32 might enable the compiler to emit better/any SIMD. We won't know without looking at the assembly.

We could keep the try_into.expect(...) bit, but let's make it apply only on debug builds. Do we absolutely need the change from i32 to i64?
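One way to get a check that only runs in debug builds is `debug_assert!`, which is compiled out when debug assertions are disabled (the default for release profiles). A minimal sketch, assuming an i32 intermediate value; the helper name `narrow_to_i16` is hypothetical:

```rust
use std::convert::TryFrom;

/// Narrows an intermediate value to i16.
/// With debug assertions enabled, an out-of-range value panics;
/// otherwise only the plain cast remains.
/// Hypothetical helper name, for illustration only.
#[inline]
pub fn narrow_to_i16(value: i32) -> i16 {
    // The condition is evaluated only when debug assertions are enabled.
    debug_assert!(
        i16::try_from(value).is_ok(),
        "interpolated value {} does not fit into i16",
        value
    );
    // Plain cast: truncates to the low 16 bits if the value is out of range.
    value as i16
}
```

This keeps the release hot path identical to a bare cast, at the cost of silent truncation there if the invariant is ever violated.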

@PetrGlad
Contributor Author

@dvdsk Security-wise there is no strict need for a checked cast, but u32 * u32 may overflow in some cases. AFAIK an overflowed value will wrap around in a release build and panic in debug mode. I'd expected some clicks or distortion in case of overflows and panics in debug mode (which was triggered by quickcheck). The final interpolated value should be in range, though.

One could use smaller integers for the ratios (I doubt u32 precision is actually necessary there), or use f32 as the interpolation coefficient. I think smaller integers, like u16, can be an approximation, maybe even something crude, like dropping some least significant bits of the u32 numerator and denominator when one of them has significant bits beyond the 16th position. Or finding some other approximate value pair.

E.g. the num crate provides such an algorithm:

let ratio = num::rational::Rational32::approximate_float(2.3f32).unwrap();
dbg!(ratio.numer(), ratio.denom());

Maybe making this more reliable can be a separate task...
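The "crude" variant mentioned above (dropping least significant bits until both terms fit into 16 bits) could be sketched as follows, using only the standard library. The function name `squeeze_ratio_to_u16` and the clamping of the denominator are assumptions made for this illustration, not something the PR implements.

```rust
/// Shifts both terms right by the same amount until each fits into 16 bits.
/// This trades precision for a bounded range. A denominator that would
/// become zero is clamped to 1 to keep the ratio usable as a divisor.
/// Hypothetical helper for illustration only.
pub fn squeeze_ratio_to_u16(numerator: u32, denominator: u32) -> (u16, u16) {
    let widest = numerator.max(denominator);
    // Number of significant bits beyond the 16th position (0 if none).
    let excess_bits = (32 - widest.leading_zeros()).saturating_sub(16);
    // After the shift both values are below 2^16, so the casts are lossless.
    let n = (numerator >> excess_bits) as u16;
    let d = ((denominator >> excess_bits) as u16).max(1);
    (n, d)
}
```

For 44100/384000 this yields 5512/48000, which is close to but not exactly the original ratio, so an approximation like this would change the effective conversion rate slightly.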

@dvdsk
Collaborator

dvdsk commented Nov 28, 2024

I'd expected some clicks or distortion in case of overflows and panics in debug mode (which was triggered by quickcheck).

Might be related to #584. I thought that was a property of the linear interpolator and could not be fixed. You might have stumbled on the fix for it 🎉

Maybe making this more reliable can be a separate task...

I think that is best. Not only can we try fixing #584 then, but we might also look at how to enable users to pick their own resampler and introduce more hi-fi alternatives like rubato or dasp's sinc resampler. That should be pretty easy; I think an extra option on the OutputStream builder?

On a separate note, how would you feel about joining me as maintainer of rodio? You seem to know your stuff and we can sure use the help.

@PetrGlad
Contributor Author

Yes, I would like to help. I cannot promise much, but I do have some free time for this.

@PetrGlad
Contributor Author

PetrGlad commented Nov 29, 2024

Regarding interpolation, I think I understand it better now. It looks like there are only some specific combinations of conversion rates where it can overflow. I have removed the explicit overflow check from interpolation and tried to clarify the limitations in the docs.
Regarding #584, it looks like the output sample is converted from 2400 Hz to 44100 Hz, and the above overflows should not occur in this case.
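As a rough way to reason about which rate combinations are affected, the sketch below checks the worst case for an i32 intermediate. It assumes the interpolation has the shape `first + (second - first) * numerator / denominator` with `numerator < denominator` and i16 samples; that shape and the name `i32_lerp_is_safe` are assumptions for this illustration, not a statement about the actual rodio code.

```rust
/// Returns true when `(second - first) * numerator` cannot overflow i32 for
/// any pair of i16 samples and any `numerator < denominator`.
/// Hypothetical helper; the formula it models is an assumption.
pub fn i32_lerp_is_safe(denominator: u32) -> bool {
    // Largest possible |second - first| for i16 samples: 65535.
    let max_delta = i64::from(i16::MAX) - i64::from(i16::MIN);
    // Largest numerator that is still below the denominator.
    let max_numerator = i64::from(denominator.saturating_sub(1));
    // Compare in i64 so the check itself cannot overflow.
    max_delta * max_numerator <= i64::from(i32::MAX)
}
```

Under these assumptions a reduced denominator of up to 32769 is safe. A pair like 44100 and 48000 reduces to 147/160 and is well within the bound, while coprime rates such as 44100 and 44101 are not.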

@PetrGlad
Contributor Author

Well, the resampler is just another filter; yes, the output stream builder can have this option.

@dvdsk
Collaborator

dvdsk commented Nov 29, 2024

CI wants another cargo fmt run 😛. Meanwhile I'll run a quick benchmark and report changes

@dvdsk
Collaborator

dvdsk commented Nov 29, 2024

tldr: Same kind of perf regression as before. Pretty strange, I would expect perf to be the same now. I'll hunt for the cause tomorrow.

this pr:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.399 ms      │ 2.795 ms      │ 2.494 ms      │ 2.505 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.137 ms      │ 2.385 ms      │ 2.187 ms      │ 2.197 ms      │ 100     │ 100
   ├─ 11025       2.377 ms      │ 2.436 ms      │ 2.392 ms      │ 2.395 ms      │ 100     │ 100
   ├─ 16000       3.085 ms      │ 3.173 ms      │ 3.12 ms       │ 3.122 ms      │ 100     │ 100
   ├─ 22050       3.501 ms      │ 5.609 ms      │ 3.52 ms       │ 3.543 ms      │ 100     │ 100
   ├─ 44100       2.432 ms      │ 3.042 ms      │ 2.557 ms      │ 2.585 ms      │ 100     │ 100
   ├─ 48000       6.456 ms      │ 6.505 ms      │ 6.476 ms      │ 6.477 ms      │ 100     │ 100
   ├─ 88200       10.71 ms      │ 10.82 ms      │ 10.75 ms      │ 10.75 ms      │ 100     │ 100
   ├─ 96000       12.14 ms      │ 14.13 ms      │ 12.18 ms      │ 12.21 ms      │ 100     │ 100
   ├─ 176400      20.01 ms      │ 23.26 ms      │ 20.07 ms      │ 20.12 ms      │ 100     │ 100
   ├─ 192000      22.95 ms      │ 26.81 ms      │ 23.02 ms      │ 23.16 ms      │ 100     │ 100
   ├─ 352800      40.27 ms      │ 40.79 ms      │ 40.4 ms       │ 40.4 ms       │ 100     │ 100
   ╰─ 384000      45.8 ms       │ 46.39 ms      │ 45.94 ms      │ 45.95 ms      │ 100     │ 100

main:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.283 ms      │ 2.75 ms       │ 2.392 ms      │ 2.426 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.342 ms      │ 2.423 ms      │ 2.356 ms      │ 2.36 ms       │ 100     │ 100
   ├─ 11025       2.518 ms      │ 2.88 ms       │ 2.533 ms      │ 2.54 ms       │ 100     │ 100
   ├─ 16000       3.207 ms      │ 3.283 ms      │ 3.25 ms       │ 3.248 ms      │ 100     │ 100
   ├─ 22050       3.599 ms      │ 3.635 ms      │ 3.613 ms      │ 3.614 ms      │ 100     │ 100
   ├─ 44100       2.326 ms      │ 3.04 ms       │ 2.48 ms       │ 2.51 ms       │ 100     │ 100
   ├─ 48000       6.368 ms      │ 6.407 ms      │ 6.384 ms      │ 6.384 ms      │ 100     │ 100
   ├─ 88200       9.995 ms      │ 10.83 ms      │ 10.28 ms      │ 10.28 ms      │ 100     │ 100
   ├─ 96000       11.92 ms      │ 12 ms         │ 11.95 ms      │ 11.95 ms      │ 100     │ 100
   ├─ 176400      19.33 ms      │ 36.44 ms      │ 19.4 ms       │ 19.57 ms      │ 100     │ 100
   ├─ 192000      22.5 ms       │ 22.71 ms      │ 22.57 ms      │ 22.57 ms      │ 100     │ 100
   ├─ 352800      39.32 ms      │ 54.21 ms      │ 39.43 ms      │ 39.67 ms      │ 100     │ 100
   ╰─ 384000      44.85 ms      │ 53.31 ms      │ 44.98 ms      │ 45.07 ms      │ 100     │ 100

@@ -83,10 +77,13 @@ where
(first, next)
};

// Reducing nominator to avoid numeric overflows during interpolation.
let (to, from) = Ratio::new(to, from).into_raw();
Collaborator

I took a look at the Ratio source, and this does the same as the original code that did from: from/gcd and to: to/gcd. See: https://docs.rs/num-rational/latest/src/num_rational/lib.rs.html#108-112. Though I had no idea why they did that until now.

Given that, what is the motivation for using Ratio over / gcd?
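For reference, the plain `/ gcd` variant can be written with the standard library alone. This is a minimal sketch; `gcd` and `reduce_rates` are illustrative names and not the functions used in rodio or num-rational.

```rust
/// Greatest common divisor by the Euclidean algorithm.
fn gcd(mut a: u32, mut b: u32) -> u32 {
    while b != 0 {
        let remainder = a % b;
        a = b;
        b = remainder;
    }
    a
}

/// Reduces a (from, to) sample-rate pair to lowest terms, which keeps the
/// interpolation numerator and denominator as small as possible.
/// Hypothetical helper for illustration only.
pub fn reduce_rates(from: u32, to: u32) -> (u32, u32) {
    // gcd(0, 0) is 0, so clamp to 1 to avoid dividing by zero.
    let divisor = gcd(from, to).max(1);
    (from / divisor, to / divisor)
}
```

For example, 44100 and 48000 share a divisor of 300, so the pair reduces to (147, 160).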

Contributor Author

@PetrGlad PetrGlad Nov 29, 2024

It is also used in tests to limit input values. I thought it made sense to reuse that and have a bit less code to test. Even if this is an extra call, it happens only once, at converter creation.

@PetrGlad
Contributor Author

Is there a way to compare the benchmark snapshots? This is the part I liked about criterion: it tries to clearly show changes between versions, although its API is more cumbersome.

@PetrGlad
Contributor Author

PetrGlad commented Nov 29, 2024

Cargo.lock should be in git somewhere, at least for benches and tests. It is necessary for builds to be reproducible. Projects that have rodio as a dependency will ignore it anyway. I see other Rust library projects do track it.
Alternatively, benches can be in a separate binary, which will have its own Cargo.lock.

It is necessary to make tests and benches reproducible.